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We demostrate that the recently proposed interacting growth walk (IGW) model, modified for 
generating self-avoiding heteropolymers, proves to be a simpler alternative to the other Monte Carlo 
methods available in the literature for obtaining minimum energy conformations of lattice proteins. 
In fact, this simple growth algorithm seems to be capable of quickly leading to low energy states for 
all the three dimensional bench mark HP-sequences investigated. 
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Proteins are non-branching heteropolymers obtained 
from twenty commonly occurring amino acids. Their 
specific biological functions are intimately related to their 
'native' (i.e., unique and thermodynamically stable) con- 
formational structure. One of the most challenging prob- 
lems in Biophysics is to understand the kinetic mech- 
anism by which the information encoded in an amino 
acid sequence helps the protein grow or fold quickly into 
its native conformation [1]. Since the amino acids can 
be classified broadly into hydrophobic (H) or polar (P) 
types, the problem may be simplified by treating a pro- 
tein as a random copolymer made up only two types of 
amino acids with non-bonded contact interactions be- 
tween them. If we ignore the intra-molecular struc- 
tures of the individual amino acids (i.e., treat them as 
monomer units) then these interactions can be expressed 
in terms of a few effective parameters. Such a coarse- 
grained approach to this problem is justified by the obser- 
vation that the protein-folding mechanisms and rates are 
determined more by the topology of protein conforma- 
tions than by the details of interatomic interactions [2] . 
By assuming further that the individual bonds in the lin- 
ear chain of monomers are all of equal length and can only 
be along the directions of a regular lattice, we have the 
lattice protein models [3], called HP-models, which are 
amenable to exact enumeration as well as Monte Carlo 
studies. 

In this paper, we consider the problem of finding the 
native conformational structure that corresponds to a 
given HP-sequence and contact interactions. For short 
chains, the native state (or equivalently, the ground 
state) can easily be found by an exact enumeration of all 
possible conformations. However, for long chains, spe- 



cially designed Monte Carlo methods [4] have to be used 
for generating a large ensemble of conformations before 
a ground state search is performed. This ignores the 
phenomenology of chain growth and its conformational 
relaxation. 

The information for building the proteins is tran- 
scribed from the DNA to the mRNA, and translated into 
the corresponding amino acid sequences by a specific bio- 
chemical process - namely, as the ribosomes move along 
the raRNA strand from one codon to the next, differ- 
ent £RNA molecules lock into the appropriate places one 
after another, resulting in the transfer of the correspond- 
ing amino acids to the growing protein molecule. And as 
the protein grows, it coils into its secondary and tertiary 
structures as well. 

In this paper, we discuss an implementation of this 
growth scenario for any given HP-sequence using the re- 
cently proposed Interacting Growth Walk (IGW) model 
[5]. We demonstrate that this simple growth algorithm 
is capable of quickly finding the ground state conforma- 
tions for all the three dimensional bench mark sequences 
investigated [6,7]. 

Given a sequence of H and P monomers, say Sjj = 
{Aj — H or P; j = 0, 1, ...,N — 1}, we start growing a 
lattice protein by placing the first monomer Aq at an 
arbitrarily chosen site, i"o, of a regular d-dimensional lat- 
tice of coordination number z. The second monomer, 
A\, can be placed in any one of the z available nearest 
neighbour (NN) sites of ro, chosen at random. Let the 
walk be non-reversing so that we have a maximum of 
z — 1 sites to choose from for making any further step. 
Let {r™ | to = 1, 2, Zfc} be the 'unoccupied' NN sites 
available for making the fcth step of the walk. If the 
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number of available sites, Zk, is zero, then the walk can 
not grow further because it is geometrically 'trapped'. 
In this case, we clear the lattice and start a fresh walk 
from Tq again. If Zk ^ 0, the walk proceeds by choosing 
one of the Zk available sites at random with a probability 
defined as follows. 

Let ro™ be the number of non-bonded NN contacts 
which the kth monomer of type would make with the 
walk if it were placed at site r™. Clearly, < n™ < z — 1. 
Some of these contacts will be with the H type, and some 
of them will be the P type monomers. Let these num- 
bers be denoted by n™ kH and respectively. If the 
energies associated with an HH-contact, an HP-contact 
and a PP-contact are denoted by Ehh, £hp and epp 
respectively, then the cost of energy involved in plac- 
ing the kth monomer of type Ak at r™ is given by 
E™ — H eA k H+n™ P SA k p- The probability of choos- 
ing the site r m for the fcth monomer of type A^ may now 
be defined as, P£ = exp(-/^)/ E m exp(-/?i^J, 
where the summation is over all the Zk available sites, 
f3 = 1/kpT, kp is the Boltzmann constant and T the 
temperature. 

At 'infinite' temperature (/3 = 0), the walk is insen- 
sitive to the type of monomers being added and so is 
identical to the Kinetic Growth Walk (KGW) which in 
turn is in the same universality class as self-avoiding walk 
[5]. However, at zero temperature (j3 — oo), the walk 
will grow into a compact or an extended conformation 
depending on whether the local site-energies, E, are neg- 
ative or positive respectively. How compact a minimum 
energy conformation is going to be depends strongly on 
the fraction, x, of H type monomers in the walk - in 
the limit x — > 1, it is the most compact one, whereas in 
the other limit \ —> 0, it is identical to the KGW which 
is noncompact. In order to mimic the tendency of the 
H type monomers to minimize contact with the aqueous 
medium by forming clusters, it is customary to choose 
the contact energies, e = {£hh,£hp,£pp) = (—1,0,0), 
in lattice protein models. 

In this case, at T = 0, an H type monomer will be 
placed at a site with maximum HH-contacts whereas a 
P type monomer is insensitive to the type of NN con- 
tacts. We have schematically illustrated this in Fig. [jj. 
It is clear that the total energy associated with a con- 
formation is equal to the total number of HH-contacts 
in the walk. However, such a straightforward adap- 
tation of the IGW algorithm, hereafter referred to as 
the HP-IGW algorithm, does not automatically lead to 
ground state conformations corresponding to a given HP- 
sequence. Hence, we have further modified the HP-IGW 
algorithm so that it can be used in the multi-pass mode. 

The given HP-sequence, Sn, may be partitioned into 
M subsequences, S n ± , Sn 2 , ■ ■ ■ , $N M > without changing 
the order in which the letters appear in the original 
sequence. Clearly, Sn = Sn 1 U Sn 2 U ... U Sn m with 
N = Ni + N 2 + ... + Nm- We concentrate on the zero 



temperature case. 
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FIG. 1. Schematic illustration of HP-chain growth on a 
square lattice at T = 0. Hydrophobic and Polar monomers are 
represented by open and filled circles respectively. Suppose 
the next monomer to be added is of Hydrophobic type, (a) 
Site A will be chosen with probability 1 since it has a NN 
contact; (b) Sites A and B will be chosen at random with 
probability 1/2 since they both have a NN contact each; (c) 
Site B will be chosen with probability 1 since it has two NN 
contacts whereas site A has only one. 

In 'pass' 1, we use the HP-IGW algorithm to generate 
a large number of chains consisting of Ni monomers in 
the order, as well as of the type, specified by the first sub- 
sequence, ■ Out of all the conformations generated, 
we store only those with minimum energy, say —£\. Let 
M\ denote the ensemble as well as the number of all these 
minimum energy conformations. This part of the algo- 
rithm is not truely kinetic because it does not simulate 
the relaxation kinetics in the conformation space. Nev- 
ertheless, together with the chain growth algorithm, it 
quickly leads to the ground state conformations. 

In 'pass' 2, we take a chain, say C\ £ A/i, and use the 
HP-IGW algorithm to let it grow further by a chain seg- 
ment consisting of Ni monomers whose order and type 
are specified by the next subsequence, Sn 2 ■ Thus we ob- 
tain a chain consisting of N± + N 2 monomers whose type 
and order are specified by the subsequence, Sn x U Sn 2 . 
We use the same chain segment C\ again and again for 
generating a specified number of (N± + A^-monomcr 
chain segments. We repeat this for all the chain seg- 
ments, {Cj G iSjvj | j = 1, 2, Ni}. It is clear that 
all these chain conformations will have energy less than 
or equal to —£%. Let A/2 denote the ensemble as well 
as the number of (N% + A^-monomer conformatins with 
minimum energy, say —£<i. 

Similarly, in 'pass' 3, we generate an ensemble, A/3, of 
(Wi + A2 + A^-monomer conformations with minimum 
energy —£3, and so on. Thus, at the end of 'pass' M, 
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we have an ensemble of ./V-monomer chain conformations 
with minimum energy, —£(< —£m-i < ••• < — £2 < 
—£i). It may be noted that the order and type of the 
monomers in the chains generated are as specified by the 
original sequence, Sn- We have schematically illustrated 
this algorithm in Fig.||. We generate a large number of 
chain conformations at every stage, not for the purpose 
of statistics, but to make sure that we obtain as many 
distinct conformations as possible. This improves the 
chances of obtaining minimum energy states. Moreover, 
the optimal lengths of the subsequences (i.e., the opti- 
mal values of Nx,N%, A3 etc.,) for which we may get the 
'global' minimum are strongly dependent on the partic- 
ular sequence under consideration. 
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FIG. 2. Schematic illustration of the muti-pass algorithm 
described in the text. A/i, A/2 and A/3 are the ensembles of 
conformations with minimum energies, —£%, —£2 and —£3 at 
the end of passes 1,2 and 3 respectively. 

We have tested this multi-pass algorithm for all the 
ten bench mark sequences of Yue et al [6] and for two 
of Toma & Toma [7] . These sequences were designed by 
minimizing the energy of a particular target conforma- 
tion on a cubic lattice with the energy parameters given 
by e = (—1,0,0). We spent a maximum of thirty min- 
utes CPU time on a DEC Alpha Workstation for each of 
these bench mark sequences. We could obtain the ground 
state conformations for some of these sequences in just 
two passes - namely, the first halves of the chains in the 
first pass, and the second halves in the second pass - 
while for others, in three passes. We find that generat- 
ing the first segment of the chain from an arbitrary point 
in the sequence, rather than always from the beginning, 
ultimately leads to better ground states: (i) we occupy 
the site ro with the monomer of the type specified by the 
?i(l < n < A^i)th letter in the subsequence Sn x ; (H) the 
next monomer will then correspond either to the (n— l)th 
or to the (n-fT)th letter in the sequence with probability 
1/2, and so on. In other words, the chain will grow either 
to the left or to the right at random with equal proba- 
bility. However, the successfully grown chain will finally 
have monomers in the same order as well as of the type 



specified by the subsequence, Sn ± ■ The chain segments 
in subsequent passes will be grown always in the order in 
which they should grow. We handled these passes sepa- 
rately and manually, rather than as integrated modules 
in a single program. In Table [j], we have presented our 
results along with the ones reported in the literature. 

TABLE I. Ground state energies obtained for the 3d bench 
mark sequences. All the sequences of Yue et al. [6] are of 
length N = 48. The sequences 10 and 11 of Toma and Toma 
[7] are of lengths, N = 46 and N = respectively. The two 
energy values presented in the middle column are those of 
Yue et al and Bastolla et al [6] 



Ref. Seq. 


—Emm (reported) 


—E min (ours) 


Yue et al [6] 1 


31, 32 


31 


2 


32, 34 


32 


3 


31, 34 


32 


4 


30, 33 


30 


5 


30, 32 


30 


6 


30, 32 


30 


7 


31, 32 


31 


8 


31, 31 


30 


9 


31, 34 


31 


10 


33, 33 


31 


Toma & Toma [7] 10 


34 


33 


11 


42 


41 



In Fig. 3, we have shown the distributions of NN con- 
tacts corresponding to the three passes used for Sequence 
3 of Yue et al. Configurations of the first segment of 
length, iVi = 19, with maximum number of contacts 
= 9) were used in the second pass for growing 
the second segment of length, N2 = 17, whose minimum 
energy configurations (n max = 22) 
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FIG. 3. The distributions of NN contacts obtained in the 
three passes used for the Sequence 3 of Yue et al 
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were in turn used for growing the last segment of length, 
A3 = 12, in the third pass. The distributions, not nor- 
malized in any way with respect to each other, schemat- 
ically illustrate how the number of contacts in a typical 
configuration can be increased in a progressive manner. 

Thus, we have a simple but powerful growth algorithm 
which can be used for obtaining the ground state confor- 
mations of lattice proteins. A finite temperature version 
of this algorithm can be implemented in two ways. Since 
the temperature used in the chain growth process, say 
Tq, could be different from the temperature T at which 
the fully grown conformations are sampled, we may use 
the growth module HP-IGW either at Tq = or at 
Tq ^ 0. In the latter case, we may set Tq — T so as 
to have just one temperature for the entire process. Hav- 
ing generated a chain conformation of energy £, we may 
store it with a probability proportional to the Botzmann 
factor, exp(— (3£ ). This algorithm is similar to the simple 
enrichment scheme of Wall and Erpenbeck [8] for KGW 
at Tq = T — > 00. It is also likely that this finite temper- 
ature version is more efficient for a ground state search 
than the zero temperature version discussed in this pa- 
per. A detailed study of this algorithm and its variants 
will be reported elsewhere. 
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